Proper versus Ad-Hoc MDL Principle for Polynomial Regression
Authors
Abstract
The paper deals with the task of polynomial regression, i.e., inducing a polynomial that can be used to predict a chosen dependent variable from the values of the independent ones. As in other induction tasks, there is a trade-off between the complexity of the induced polynomial and its predictive error. One approach to finding an optimal trade-off is the Minimal Description Length (MDL) principle. In our previous papers on polynomial regression, we proposed an ad-hoc MDL principle. The focus of this paper is on developing a proper encoding scheme for polynomials, which leads to a proper MDL principle for polynomial regression. We implemented the developed MDL principle as a search heuristic in CIPER, an algorithm for inducing polynomials from data. We present an empirical comparison between the heuristics based on the ad-hoc and the proper MDL principle. The results show that the proper MDL principle leads to simpler polynomials with comparable predictive error. Finally, we also propose a lower bound for the proper MDL principle that allows branch-and-bound pruning of the CIPER search space, and we evaluate the benefits of this pruning.
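To make the complexity/error trade-off concrete, the sketch below scores a candidate polynomial with a generic two-part MDL-style criterion: bits for the model (its monomial structure and coefficients) plus bits for the residuals under a Gaussian noise code. This is only an illustrative stand-in, assuming a fixed coefficient precision and uniform exponent codes; the proper encoding scheme developed in the paper, and the heuristic actually used inside CIPER, are defined in the text. The names mdl_score, design_matrix, max_degree, and coef_bits are hypothetical.

```python
# Minimal sketch of a two-part MDL-style score for polynomial regression.
# Not the paper's encoding: coefficient precision and exponent codes are
# illustrative assumptions.
import math
import numpy as np

def design_matrix(X, terms):
    """Build monomial features; each term is an exponent tuple,
    e.g. (2, 1) stands for x1^2 * x2."""
    return np.column_stack([np.prod(X ** np.array(t), axis=1) for t in terms])

def mdl_score(X, y, terms, max_degree=5, coef_bits=32):
    """Return (score, coefficients), where score = model bits + error bits."""
    Phi = design_matrix(X, terms)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    residuals = y - Phi @ coef
    n = len(y)
    # Part 1: bits for the structure (one exponent per variable per term)
    # plus a fixed-precision code for each coefficient.
    model_bits = sum(len(t) * math.log2(max_degree + 1) for t in terms)
    model_bits += len(terms) * coef_bits
    # Part 2: bits for the residuals under a Gaussian noise code,
    # roughly (n / 2) * log2(MSE), up to an additive constant.
    mse = max(float(np.mean(residuals ** 2)), 1e-12)
    error_bits = 0.5 * n * math.log2(mse)
    return model_bits + error_bits, coef

# Usage: compare a linear and a quadratic candidate and keep the one with
# the smaller description length, as a greedy refinement search would.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
for terms in [[(1,)], [(1,), (2,)]]:
    score, _ = mdl_score(X, y, terms)
    print(terms, round(score, 1))
```

In a refinement search such as CIPER's, where each step only adds structure to a candidate, a lower bound on the score of all refinements of a candidate (such as the bound the paper proposes for the proper MDL principle) lets the search discard entire branches whose bound already exceeds the best score found so far.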
Similar Resources
Layered Representation of Motion Video Using Robust Maximum-Likelihood Estimation of Mixture Models and MDL
Representing and modeling the motion and spatial support of multiple objects and surfaces from motion video sequences is an important intermediate step towards dynamic image understanding. One such representation, called layered representation, has recently been proposed. Although a number of algorithms have been developed for computing these representations, there has not been a consolidated e...
Layered Representation of Motion Video Using Robust Maximum-Likelihood Estimation of Mixture Models and MDL Encoding
Representing and modeling the motion and spatial support of multiple objects and surfaces from motion video sequences is an important intermediate step towards dynamic image understanding. One such representation, called layered representation, has recently been proposed. Although a number of algorithms have been developed for computing these representations, there has not been a consolidated e...
Model Selection with the Loss Rank Principle
A key issue in statistics and machine learning is to automatically select the “right” model complexity, e.g., the number of neighbors to be averaged over in k nearest neighbor (kNN) regression or the polynomial degree in regression with polynomials. We suggest a novel principle, the Loss Rank Principle (LoRP), for model selection in regression and classification. It is based on the loss rank, whi...
The Loss Rank Principle for Model Selection
We introduce a new principle for model selection in regression and classification. Many regression models are controlled by some smoothness or flexibility or complexity parameter c, e.g. the number of neighbors to be averaged over in k nearest neighbor (kNN) regression or the polynomial degree in regression with polynomials. Let f̂_c^D be the (best) regressor of complexity c on data D. A more fl...
Inference of Gene Regulatory Networks Based on a Universal Minimum Description Length
The Boolean network paradigm is a simple and effective way to interpret genomic systems, but discovering the structure of these networks remains a difficult task. The minimum description length (MDL) principle has already been used for inferring genetic regulatory networks from time-series expression data and has proven useful for recovering the directed connections in Boolean networks. However...